Abstract

Approximation methods for the minimum average cost per unit time problem with a controlled diffusion model are treated. In order to work with a bounded state space, we use the reflecting diffusion model of Stroock and Varadhan, although other models can also be treated. The control problem is approximated by an average cost per unit time problem for a Markov chain, and weak convergence methods are used to show convergence of the minimum costs to that for the optimal diffusion. The procedure is quite natural and allows the approximation of many interesting functionals of the optimal process.

1. Introduction. In this paper, we develop an approximation and computational approach to a particularly difficult class of stochastic control problems. The computational problem leads to the approximation of the original process and optimization problem by an interesting and simpler sequence of processes and optimization problems, which yields much information on the original optimal process.

Let $w(\cdot)$ denote an $R^r$-valued Wiener process, let $\mathcal{U}$ denote a compact set and define the bounded and continuous functions $f(\cdot,\cdot): R^r \times \mathcal{U} \to R^r$; $k(\cdot,\cdot): R^r \times \mathcal{U} \to R$; $\sigma(\cdot): R^r \to r \times r$ matrices. Let $x(\cdot)$ denote a non-anticipative solution to the Ito equation

(1)  $dx = f(x,u)\,dt + \sigma(x)\,dw,$

where $u(\cdot)$ is a non-anticipative (always with respect to $w(\cdot)$) $\mathcal{U}$-valued progressively measurable control function. For typographical simplicity we sometimes write $x_s$ for $x(s)$, etc. Define $\gamma^u(\cdot)$ by

(2)  $\gamma^u(x) = \lim_{t\to\infty} \frac{1}{t} E_x^u \int_0^t k(x_s,u_s)\,ds,$

where $E_x^u$ denotes the expectation when $x_0 = x$ and control $u(\cdot)$ is used.

We are interested in finding good approximations to the infimum $\bar\gamma$ of $\gamma^u(x)$ over all controls $u(\cdot)$, and to the optimal control, and also other information concerning the optimal trajectory, in cases where the infimum does not depend on the initial state $x$. Furthermore, we want to be able to compute the approximation and obtain the additional information using practical computational methods.

A number of difficulties stand in the way of a practical computation. First, the state space of $x(\cdot)$ is unbounded and the control problem (1)-(2) will have to be modified so that the state space is bounded. This is a particularly ticklish point, since we want a modification which yields usable information concerning the original problem. In particular situations, a great deal of attention must be devoted to this. For definiteness, we use the bounded process defined in Section 4, although many others are possible. Next, we have not assumed very much about the system (1). If $\gamma^u(\cdot)$ actually depends on $x$, then very little is known about the problem. Fortunately, for many problems (perhaps the most important ones) we can restrict attention to $u(\cdot)$ which are stationary ($u(\cdot)$ is a stationary process), or to the stationary pure Markov case (where $u_t = u(x_t)$). Even then, the solution to (1) may not be unique. In practical problems, it is often demanded that the system have a certain robustness. Criteria such as (2) are of interest when the system is to operate over a long period of time, usually of uncertain duration and with an uncertain initial condition. It is usually desired that the control be stationary pure Markov, and that for the controls $u(\cdot)$ in the class which are to be considered there be an invariant measure to which the measures of $x(t)$ tend as $t \to \infty$ for each $x = x_0$. In certain cases (e.g., Kushner [1]) one can restrict attention to such controls. In general, little is known about the continuous parameter problem, and many of the difficulties in the way of establishing convergence of a computational procedure are due to this. Also, it is usually hard to approximate problems over an infinite time interval, unless the approximation and limit processes are stationary. Furthermore, the ergodic subsets for each approximation may depend on the approximation. In any case, the procedures to be developed here are very natural, provide much information, and do give the desired results under broad conditions. We will later make an additional assumption on the system.

Our approach follows the ideas in Kushner [2], [3] and Kushner and DiMasi [4]. The problem (1), (2) is approximated by a control problem on a Markov chain (with approximation parameter $h$), and weak convergence methods are used to show that certain interpolations of the sequence of approximating chains converge weakly to an optimal process. The limit yields a great deal of information on the optimal process; e.g., invariant measures and joint distributions.

A formal dynamic programming approach to the optimization of (1), (2) is given in Section 2. Section 3 argues for a "computational approximation" and a bounded state space. The actual form of the bounded state space model, the Stroock-Varadhan model of a reflected diffusion [5], is discussed in Section 4. This model is used partly for the sake of specificity and partly because it allows us to illustrate some interesting features of the weak convergence and boundary time scaling. The actual discrete state model is developed in Section 5, and Sections 6 and 7 give the weak convergence results.


2. A Dynamic Programming Sufficient Condition for Optimality for (1), (2).

Let $\mathcal{L}^u$ denote the differential generator of (1):

$\mathcal{L}^u = \sum_i f_i(x,u) \frac{\partial}{\partial x_i} + \sum_{i,j} a_{ij}(x) \frac{\partial^2}{\partial x_i \partial x_j}, \qquad a(\cdot) = \sigma(\cdot)\sigma'(\cdot)/2.$

When evaluating $\mathcal{L}^u F(\cdot)$ at $t, \omega$, for a $C^2(R^r)$ function $F(\cdot)$, set $x = x_t$, $u = u_t$. Suppose that there is a $C^2(R^r)$ function $V(\cdot)$ and a constant $\bar\gamma$ such that

(3)  $\inf_{\alpha \in \mathcal{U}} [\mathcal{L}^{\alpha} V(x) + k(x,\alpha) - \bar\gamma] = 0,$

where $\mathcal{L}^{\alpha}$ is now treated as a parametrized operator. If there is a Borel function $\bar u(\cdot)$ on $R^r$ such that $\alpha = \bar u(x)$ minimizes at $x$ in (3) for each $x \in R^r$, and to which there corresponds a process (1) such that $E_x^{\bar u} V(x_t)/t \to 0$, then

(4a)  $\bar\gamma = \lim_{t\to\infty} \frac{1}{t} E_x^{\bar u} \int_0^t k(x_s, \bar u_s)\,ds.$

If, in addition, $v(\cdot)$ is any $\mathcal{U}$-valued non-anticipative $(\omega,t)$ progressively measurable function (henceforth called a control) corresponding to which there is a solution to (1), and if $\frac{1}{t} E_x^v V(x_t) \to 0$, then

(4b)  $\bar\gamma \le \lim_{t\to\infty} \frac{1}{t} E_x^v \int_0^t k(x_s, v_s)\,ds,$

and $\bar u(\cdot)$ is optimal with respect to such $v(\cdot)$ in the sense that $\bar\gamma \le \gamma^v$ for any $x_0$, whether fixed or random. Under $\bar u(\cdot)$ or $v(\cdot)$, (1) is homogeneous, but there is not necessarily a unique invariant measure.
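
The inequality (4b) rests on a short formal calculation, which we sketch here under the added assumption (not stated in the text) that the stochastic integral in Ito's formula has zero expectation. Here $\mathcal{L}^{\alpha}$ is the generator of (1) for control value $\alpha$, and $\bar\gamma$ is the constant in (3).

```latex
\begin{align*}
E_x^v V(x_t) - V(x)
  &= E_x^v \int_0^t \mathcal{L}^{v_s} V(x_s)\,ds \\
  &\ge E_x^v \int_0^t \bigl[\bar\gamma - k(x_s,v_s)\bigr]\,ds
   = \bar\gamma\, t - E_x^v \int_0^t k(x_s,v_s)\,ds .
\end{align*}
```

The inequality is (3) applied pointwise with $\alpha = v_s$. Dividing by $t$, letting $t \to \infty$ and using $E_x^v V(x_t)/t \to 0$ gives (4b). For the minimizing control the inequality becomes an equality, which gives (4a).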

3. Bounded State Space Approximations. The approximation and computational method developed in [2] is roughly as follows. Let $u(\cdot)$ be fixed, and let it be a function only of the state $x$. We derive a family (parametrized by $h$) of Markov chains. For fixed $u(\cdot)$, the sequence of (suitable) continuous parameter interpolations of the chains converges weakly to the solution to (1), as $h \to 0$, under broad conditions. For each $h$, we have a controlled (indexed by $u(\cdot)$) family of Markov chains. Optimize, using the appropriate Markov chain version of (2), and obtain the minimum value function for each chain. As $h \to 0$, the sequence of minimum values converges to the infimum, over a large class of comparison controls, of the value function of the original problem. Also, many properties of the approximations converge to similar properties of the limiting optimal process.

Since  our  interest  is  in  feasible  computations,  as  well  as  in 
convergence,  it  is  necessary  that  for  each  h the  state  space  of 
the  approximating  chain  be  finite.  This  requirement  necessitates 
revision  of  the  original  system  (1) . The  following  are  among 
several  possibilities  that  can  be  dealt  with. 

(i) The state space may be naturally bounded, in that there are bounded sets $G_0, G_1$ such that if $x_0 \in G_0$, then $x_t \in G_1$ for all $t$ and all $u(\cdot)$.

(ii) If $x_0 \in G_0$, then the approximating Markov chain remains in $G_1$, for all $h$, under the optimizing controls.

(iii) Impulsive control terms ([2], Chapter 8) are added to the cost function, such that the state is guaranteed to be "impulsively" driven back into a bounded set if it ever leaves it.

(iv) A bounded set $G$ can be introduced, such that $x_t$ is not allowed to leave $\bar G = G + \partial G$. To guarantee this, a suitable boundary process is introduced on $\partial G$.

For concreteness in the development, a particular form of (iv) will be dealt with. We let $G$ be a hyper-rectangle and suppose that $x_t$ is reflected from $\partial G$. A hyper-rectangle is chosen only to simplify the specification of the approximation on the boundary. Any region for which a specification with the proper convergence properties exists can be used.


4. The Submartingale Problem of Stroock and Varadhan [5] in $\bar G$.

In order to assure ourselves that the reflection is well defined, assume

(A1) for each $i$, $a_{ii}(x)$ is strictly positive on the boundary planes of $G$ which are parallel to $\{x: x_i = 0\}$, where $x_i$ = $i$-th component of $x$.

We introduce a boundary control and cost function. Let $\mathcal{U}_0$ be a compact set, and define the bounded continuous functions $\gamma(\cdot,\cdot): \partial G \times \mathcal{U}_0 \to R^r$; $k_0(\cdot,\cdot): \partial G \times \mathcal{U}_0 \to R$; $\rho(\cdot): \partial G \to [0,1]$. Let the vector $\gamma(x,\alpha)$ with origin $x$ point strictly interior to $G$ for each $x \in \partial G$ and $\alpha \in \mathcal{U}_0$. For $A \subset R^r$, set $I_A(x)$ = indicator of set $\{x: x \in A\}$, and let $x(\cdot)$ denote the generic element of $C^r[0,\infty)$ ($R^r$-valued continuous functions on $[0,\infty)$) as well as the solution to (1). Hopefully, no confusion will arise. Define $C_G = C^r[0,\infty) \cap \{x(\cdot): x_t \in \bar G, \text{ all } t < \infty\}$ and $\mathcal{F}_t$ = $\sigma$-algebra on $C_G$ induced by the projections $x_s$, $s \le t$. For this reflecting diffusion, admissible controls $u(\cdot)$ are $\mathcal{U}$-valued when the process state $x_t \in G$, and are $\mathcal{U}_0$-valued when the process state†† $x_t \in \partial G$. For† $q(\cdot,\cdot) \in C^{2,1}(\bar G \times [0,\infty))$ and admissible $u(\cdot)$, define the function $F_q^u(\cdot,\cdot)$ on $C_G \times [0,\infty)$ by

(5)  $F_q^u(x(\cdot),t) = q(x_t,t) - q(x_0,0) - \int_0^t \left[\frac{\partial}{\partial s} + \mathcal{L}^{u_s}\right] q(x_s,s)\, I_G(x_s)\,ds.$

†$C^{2,1}$ is the set of uniformly bounded continuous functions on $\bar G \times [0,\infty)$ whose derivatives up to second order in $x$ and first in $t$ are continuous and uniformly bounded.

††and $u_t$ is measurable.

For the moment, let $u(\cdot)$ depend only on the current state $x$. Suppose that for some $y \in \bar G$, there is a measure $P_y^u$ on $C_G$ such that $P_y^u\{x_0 = y\} = 1$ and for each $q(\cdot,\cdot)$ in $C^{2,1}(\bar G \times [0,\infty))$ for which $\rho(x) q_t(x,t) + \gamma'(x,u(x)) q_x(x,t) \ge 0$ for all $x \in \partial G$ and all $t \ge 0$, the process $\{F_q^u(\cdot,t), \mathcal{F}_t, P_y^u\}$ is a submartingale. Then $P_y^u$ is said to solve the submartingale problem for initial value $y$. If, in the above, the vector $y$ can be replaced by a measure $\pi_0$ on $\bar G$, and $P_{\pi_0}^u\{x_0 \in \Gamma\} = \pi_0(\Gamma)$ for each Borel set $\Gamma$, then $P_{\pi_0}^u$ is said to solve the submartingale problem for initial measure $\pi_0$.

If $u(\cdot)$ depends only on the current state $x$, then the solution to the submartingale problem gives the desired reflected diffusion, and $\gamma(x,u(x))$ is the average "direction of reflection" at $x \in \partial G$, and $\rho(x)$ is a scale factor which determines the relative time that $x(\cdot)$ spends on $\partial G$ ([2], [3], [5]). Since $\rho(\cdot)$ only affects the time scale, and not the costs ([3]; [2], Chapter 10), for our modelling purpose it is sufficient to set $\rho(x) = 1$, which we will do.

Let $P_y^u$ solve the submartingale problem. There is a non-decreasing scalar valued process $\mu(\cdot)$, which only increases when $x_t \in \partial G$, and is such that for the above $q(\cdot,\cdot)$

(6)  $F_q^u(x(\cdot),t) - \int_0^t [q_s(x_s,s) + \gamma'(x_s,u_s) q_x(x_s,s)]\,d\mu_s$

is a martingale (with respect to $\{\mathcal{F}_t, P_y^u\}$). Furthermore, there is a standard Wiener process† $w(\cdot)$ such that under $P_y^u$, $(x(\cdot), u(\cdot), \mu(\cdot))$ are non-anticipative with respect to $w(\cdot)$ and w.p.1

(7)  $x_t = y + \int_0^t f(x_s,u_s) I_G(x_s)\,ds + \int_0^t \gamma(x_s,u_s)\,d\mu_s + \int_0^t \sigma(x_s) I_G(x_s)\,dw_s.$

For the control problem, we may wish to deal with a larger class of (admissible) controls than the stationary pure Markov class. We can still speak of a solution to the submartingale problem, but then the measure $P_y^u$ or $P_{\pi_0}^u$ must be defined on the appropriate $\sigma$-algebra on the product space of $C_G$ and the path space for the control process. If this extended submartingale problem has a solution, then the non-decreasing process $\mu(\cdot)$ and Wiener process $w(\cdot)$ will still exist and (6), (7) hold.


A modified control problem. Suppose that there is a solution to the submartingale problem corresponding to admissible control $u(\cdot)$, and initial condition $y$. Define $\gamma^u(y)$ now by

(8)  $\gamma^u(y) = \lim_{t\to\infty} \frac{1}{t} E_y^u \left\{ \int_0^t k(x_s,u_s) I_G(x_s)\,ds + \int_0^t k_0(x_s,u_s)\,d\mu_s \right\}.$

†To construct the Wiener process $w(\cdot)$, we may have to augment the probability space by adding an independent Wiener process.

Since $\rho = 1$, we can set $d\mu_s = I_{\partial G}(x_s)\,ds$. The formal dynamic programming equation (3) is replaced by

(9)  $\inf_{\alpha \in \mathcal{U}} [\mathcal{L}^{\alpha} V(x) + k(x,\alpha) - \bar\gamma] = 0, \quad x \in G,$
     $\inf_{\alpha \in \mathcal{U}_0} [V_x'(x)\gamma(x,\alpha) + k_0(x,\alpha) - \bar\gamma] = 0, \quad x \in \partial G,$

where $V(\cdot)$ is now assumed to be bounded. If there is a solution to the submartingale problem corresponding to admissible control $v(\cdot)$ and initial condition $y$, and also a smooth function $V(\cdot)$ and constant $\bar\gamma$ solving (9), then

(10)  $\bar\gamma \le \gamma^v(y).$

If there is a Borel admissible control $\bar u(\cdot)$ which attains the infimum in (9), and for which the submartingale problem has a solution for each initial condition $x$, then $\bar\gamma = \gamma^{\bar u}(y)$ and $\bar u(\cdot)$ is optimal. We emphasize that although (9) will serve as the basis of our approximation, it need not have a solution of any sort for our method to be valid.


5. Discretization. There are a number of techniques for getting an approximating sequence of Markov chain control problems with the correct convergence properties. We use the method in [2] mainly because it is relatively straightforward, fairly well understood and we can refer to existing results. The method is based on a finite difference approximation with difference interval $h$. A particular (but natural) finite difference approximation to (9) is used. It makes no difference whether or not (9) has a smooth solution, for the finite difference approximation is not used to solve (9). After a suitable rearrangement, the coefficients of certain terms in the finite difference approximation will be transition probabilities for an approximating controlled Markov chain. This is the only use to which (9) will be put. The method gives us an approximating chain simply and automatically. A detailed outline of the method and of some of the convergence properties will be given, but many of the details which can be found in the basic references [2], [3], [4] will be omitted.

Let $e_i$ = unit vector in $i$-th coordinate direction, and assume for convenience that each side of $G$ is an integral multiple of $h$. Let $G_h$ denote the finite difference grid on $G$, and set $\partial G_h = \bar G_h - G_h$, where $\bar G_h$ is the finite difference grid on $\bar G$. Now, let us discretize (9). On $\partial G$, use the approximation

(11)  $V_{x_i}(x) \to [V(x+e_i h) - V(x)]/h$,  if $\gamma_i(x,\alpha) \ge 0$,
      $V_{x_i}(x) \to [V(x) - V(x-e_i h)]/h$,  if $\gamma_i(x,\alpha) < 0$.

In $G$, use the approximation

(12)  $V_{x_i}(x) \to [V(x+e_i h) - V(x)]/h$,  if $f_i(x,\alpha) \ge 0$,
      $V_{x_i}(x) \to [V(x) - V(x-e_i h)]/h$,  if $f_i(x,\alpha) < 0$,
      $V_{x_i x_i}(x) \to [V(x+e_i h) + V(x-e_i h) - 2V(x)]/h^2$.

The approximations for $V_{x_i x_j}(x)$, $i \ne j$, are long, and the reader is referred to [2], Chapter 6.2 for one set of possibilities. Simply to avoid writing these down here, we suppose that $\sigma(x)\sigma'(x)$ is diagonal. This assumption is not required by anything except our current laziness. It does not affect the outcome, only the precise form of the functions $Q_h(\cdot,\cdot)$ and $p^h(\cdot,\cdot)$ introduced below.

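Define $Q_h(x,\alpha)$, $Q_h(x)$ and $\Delta t^h(x)$ by

$Q_h(x,\alpha) = h \sum_i |f_i(x,\alpha)| + \sum_i \sigma_i^2(x), \quad x \in G_h,$

$Q_h(x,\alpha) = \sum_i |\gamma_i(x,\alpha)|, \quad x \in \partial G_h,$

$Q_h(x) = \sup_{\alpha} Q_h(x,\alpha),$

(where $\sigma_i^2(x)$ is the $i$-th diagonal entry of $\sigma(x)\sigma'(x)$ and $\alpha$ ranges over the appropriate set $\mathcal{U}$ or $\mathcal{U}_0$),

$\Delta t^h(x) = h/Q_h(x)$ on $\partial G_h$,  $\Delta t^h(x) = h^2/Q_h(x)$ on $G_h$.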

Approximating the derivatives in (9) by (11)-(12) and rearranging terms yields the following equation, where $\bar\gamma^h$ and $V^h(\cdot)$ are used to denote the solution to the discretized equation and we use the definitions $g^+(x) = \max[g(x),0]$ and $g^-(x) = \max[0,-g(x)]$.

(13)  $h^2 \bar\gamma^h = \inf_{\alpha \in \mathcal{U}} \Big[ -Q_h(x,\alpha) V^h(x) + \sum_{i,\pm} V^h(x \pm e_i h)\big(h f_i^{\pm}(x,\alpha) + \sigma_i^2(x)/2\big) + h^2 k(x,\alpha) \Big], \quad x \in G_h,$

      $h \bar\gamma^h = \inf_{\alpha \in \mathcal{U}_0} \Big[ -Q_h(x,\alpha) V^h(x) + \sum_{i,\pm} V^h(x \pm e_i h)\,\gamma_i^{\pm}(x,\alpha) + h k_0(x,\alpha) \Big], \quad x \in \partial G_h.$

Define $p^h(x, x \pm e_i h \mid \alpha)$ = (coefficient of $V^h(x \pm e_i h)$)$/Q_h(x)$, $p^h(x,x \mid \alpha) = [Q_h(x) - Q_h(x,\alpha)]/Q_h(x)$. Divide (13) through by $Q_h(x)$ and rearrange to get

(14)  $V^h(x) + \bar\gamma^h \Delta t^h(x) = \inf_{\alpha \in \mathcal{U}} \Big[ \sum_{i,\pm} V^h(x \pm e_i h)\, p^h(x, x \pm e_i h \mid \alpha) + V^h(x)\, p^h(x,x \mid \alpha) + k(x,\alpha)\Delta t^h(x) \Big], \quad x \in G_h,$

and similarly for $x$ in $\partial G_h$, where $\mathcal{U}$ and $k$ are replaced by $\mathcal{U}_0$ and $k_0$, resp. Define $p^h(x,y \mid \alpha) = 0$ for all $x, y$ other than $y = x$ or $y = x \pm e_i h$ for some $i$. Then $\{p^h(x,y \mid \alpha),\ x, y \in \bar G_h\}$ is a transition probability for a controlled Markov chain.
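
The rearrangement that turns finite difference coefficients into transition probabilities can be checked numerically. The following Python fragment is only an illustrative sketch for a single one-dimensional grid point. The function names and the arguments `f`, `sigma2` (standing for $\sigma^2(x)$), `gamma` and `q_bar` (standing for the normalizer $Q_h(x)$) are hypothetical and are not taken from the references.

```python
def interior_transitions(f, sigma2, h, q_bar):
    """One-dimensional interior point: probabilities of moving up, down,
    or staying, and the interpolation interval dt = h**2 / q_bar.

    q_bar must dominate Q_h(x, alpha) = h*|f| + sigma2.
    """
    q = h * abs(f) + sigma2
    if q_bar <= 0 or q_bar < q:
        raise ValueError("q_bar must be positive and at least h*|f| + sigma2")
    p_up = (h * max(f, 0.0) + sigma2 / 2.0) / q_bar
    p_down = (h * max(-f, 0.0) + sigma2 / 2.0) / q_bar
    p_stay = (q_bar - q) / q_bar
    return p_up, p_down, p_stay, h * h / q_bar


def boundary_transitions(gamma, h, q_bar):
    """One-dimensional boundary point with reflection direction gamma:
    same construction with Q_h(x, alpha) = |gamma| and dt = h / q_bar."""
    q = abs(gamma)
    if q_bar <= 0 or q_bar < q:
        raise ValueError("q_bar must be positive and at least |gamma|")
    p_up = max(gamma, 0.0) / q_bar
    p_down = max(-gamma, 0.0) / q_bar
    p_stay = (q_bar - q) / q_bar
    return p_up, p_down, p_stay, h / q_bar


if __name__ == "__main__":
    print(interior_transitions(0.5, 1.0, 0.1, 1.2))
    print(boundary_transitions(-2.0, 0.1, 2.0))
```

In each case the three probabilities are nonnegative and sum to one, which is the only property needed for them to define a chain. The self-transition probability is positive exactly when the control value in use does not attain the supremum defining the normalizer.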


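Let $\{\xi_n^h\}$ denote the random variables of the chain, and define $\mathcal{U}(x) = \mathcal{U}$ in $G$, and $\mathcal{U}(x) = \mathcal{U}_0$ on $\partial G$, and redefine $k(x,\alpha)$ to equal $k_0(x,\alpha)$ for $x \in \partial G$. Then (14) can be rewritten in the form

(15)  $V^h(x) + \bar\gamma^h \Delta t^h(x) = \inf_{\alpha \in \mathcal{U}(x)} [E_x^{\alpha} V^h(\xi_1^h) + k(x,\alpha)\Delta t^h(x)], \quad x \in \bar G_h.$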


In (13)-(15), we supposed that $\bar\gamma^h$ is a constant. This is almost equivalent to the assumption that there is only one recurrence class for the chain under the optimal control. If there is more than one recurrence class, the numerical problem is harder. Let us henceforth assume

(A2) For each small $h$ and under each stationary pure Markov control, there is only one recurrence class.

This assumption seems to hold in very many cases of practical interest. It can be dispensed with, but then the problem of actually solving (13)-(15) is much harder. Under (A2), (15) can be solved by either Howard's iteration in policy space for semi-Markov processes, or by a version of the backward iteration method for the average cost per unit time problem (see, e.g., Schweitzer and Federgruen [8], but adapted to a semi-Markov process model). There is an optimal stationary pure Markov control $u^h(\cdot)$ for all small $h$, it is the minimizer in (15), and it is optimal with respect to all controls for the discrete problem. The "semi-Markov" point will be returned to below. The optimal solution is given in the first line of (19).
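
To make the policy space iteration concrete, here is a minimal pure-Python sketch on a small finite chain. It is an illustration under stated assumptions, not the procedure of [8]: the data layout (`P[x][a]` a list of transition probabilities, `k[x][a]` a cost rate, `dt[x]` an interpolation interval), the normalization $V(0) = 0$ and all function names are hypothetical, and a single recurrence class under every policy is assumed so that each evaluation step has a unique solution.

```python
def solve_linear(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate(policy, P, k, dt):
    """Solve V(x) + g*dt(x) = sum_y P(x,y) V(y) + k(x)*dt(x) with V(0) = 0."""
    n = len(dt)
    rows, rhs = [], []
    for x in range(n):
        act = policy[x]
        row = [(1.0 if x == y else 0.0) - P[x][act][y] for y in range(n)]
        row[0] = dt[x]  # V(0) is fixed at 0, so column 0 carries the unknown g
        rows.append(row)
        rhs.append(k[x][act] * dt[x])
    sol = solve_linear(rows, rhs)
    return sol[0], [0.0] + sol[1:]


def policy_iteration(P, k, dt, max_iter=100):
    """Alternate evaluation and improvement until the policy stops changing."""
    n = len(dt)
    policy = [0] * n
    g, V = evaluate(policy, P, k, dt)
    for _ in range(max_iter):
        new_policy = []
        for x in range(n):
            def q(a, x=x):
                return sum(p * v for p, v in zip(P[x][a], V)) + k[x][a] * dt[x]
            best = min(range(len(P[x])), key=q)
            # keep the current action on (near) ties to avoid cycling
            if q(policy[x]) <= q(best) + 1e-12:
                best = policy[x]
            new_policy.append(best)
        if new_policy == policy:
            break
        policy = new_policy
        g, V = evaluate(policy, P, k, dt)
    return g, V, policy


if __name__ == "__main__":
    P = [[[0.1, 0.9], [0.5, 0.5]], [[0.5, 0.5]]]
    k = [[1.0, 1.0], [3.0]]
    dt = [1.0, 1.0]
    print(policy_iteration(P, k, dt))
```

The evaluation step is a linear system in the cost rate and the relative values, and the improvement step is the pointwise minimization on the right side of the dynamic programming equation for the chain.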

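Discussion of (14). For $y \in G_h$ we have for any stationary pure Markov control $u(\cdot)$

(16a)  $E_y[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = f(y,u(y))\,\Delta t^h(y),$

       $\mathrm{cov}_y[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = \sigma(y)\sigma'(y)\,\Delta t^h(y) + o(\Delta t^h(y)), \quad y \in G_h.$

For $y \in \partial G_h$,

(16b)  $E_y[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = \gamma(y,u(y))\,\Delta t^h(y),$

       $\mathrm{cov}_y[\xi_{n+1}^h - \xi_n^h \mid \xi_n^h = y,\ u(\cdot) \text{ used}] = o(\Delta t^h(y)).$

These "infinitesimal" properties (derived in [2], [3]), together with (15), suggest a close relation between the controlled chain and the controlled reflected diffusion.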
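
The interior moment relations can be checked directly from the transition probabilities. The sketch below does this for one scalar interior point, assuming for simplicity that the normalizer equals $h|f| + \sigma^2$ (so there is no self-transition); the function name and the sample values of `f` and `sigma2` are hypothetical.

```python
def one_step_moments(f, sigma2, h):
    """Mean and variance of one chain step at a scalar interior point,
    together with the interpolation interval dt = h**2 / (h*|f| + sigma2)."""
    q = h * abs(f) + sigma2
    dt = h * h / q
    p_up = (h * max(f, 0.0) + sigma2 / 2.0) / q
    p_down = (h * max(-f, 0.0) + sigma2 / 2.0) / q
    mean = h * (p_up - p_down)
    var = h * h * (p_up + p_down) - mean ** 2
    return mean, var, dt


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001):
        mean, var, dt = one_step_moments(0.5, 1.0, h)
        # mean/dt reproduces the drift; var/dt approaches sigma2 as h shrinks
        print(h, mean / dt, var / dt)
```

In this scalar case the variance per unit interpolated time equals $\sigma^2 + h|f| - f^2 \Delta t$, so the discrepancy from $\sigma^2$ is of order $h$, which is the $o(\Delta t)$ term after multiplying through by the interval.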

These relations are brought out quite clearly when the chain is suitably interpolated into a continuous parameter process, and (15), (16) suggest several useful interpolations. First, we note that solving (15) is the only computation that need be done. Equation (15) is not quite the dynamic programming equation for the average cost per unit time for the controlled chain $\{\xi_n^h\}$, since $\bar\gamma^h$ has a state dependent coefficient $\Delta t^h(\cdot)$. However, it is the dynamic programming equation for a semi-Markov process or, equivalently, for the types of continuous parameter interpolations which are discussed below.

Let $\pi^h$ denote the invariant measure which corresponds to the optimal control. Henceforth, unless otherwise mentioned, $\{\xi_n^h\}$ refers to the optimal chain, with initial measure $\pi^h$.

We now choose an interpolation method and show that the sequence of interpolated processes converges weakly to a solution to the submartingale problem corresponding to some admissible control $\bar u(\cdot)$, and that this solution is an optimal one, with cost rate $\bar\gamma = \lim_{h \to 0} \bar\gamma^h$.

Either of the following two piecewise constant interpolations will serve our purpose.

Interpolation 1. Define $\Delta t^h(\xi_i^h) = \Delta t_i^h$, $t_n^h = \sum_{i=0}^{n-1} \Delta t_i^h$. Define the semi-Markov process $\xi^h(\cdot)$ by $\xi^h(t) = \xi_n^h$ on $[t_n^h, t_{n+1}^h)$. This interpolation was used in [2], [3].

Interpolation 2. Let $\xi^h(\cdot)$ denote the Markov jump process on $\bar G_h$ defined by: if $\xi^h(t) = y$, then the average additional time spent in state $y$ before a jump is $\Delta t^h(y)$, and $P\{\text{next state} = y' \mid \text{current state} = y\} = p^h(y,y' \mid u^h(y))$. There is a slight ambiguity here since it is possible that $p^h(y,y \mid u^h(y)) > 0$. But this should cause no confusion, for it simply means that there is a jump of "zero" magnitude. The average interjump times can be normalized to avoid this, but it hardly seems worthwhile. Note that

$P\{\text{jump in } (t,t+\Delta] \mid \xi^h(t) = y\} = (\Delta/\Delta t^h(y)) + o(\Delta).$

This interpolation is developed in Section 8 of [4].
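
A jump process of the second kind can be simulated by drawing exponential holding times whose mean is the interpolation interval of the current state, which matches the jump rate stated above. The following Python sketch assumes a hypothetical representation: `p[y]` is a list of `(next_state, probability)` pairs under a fixed control, and `dt[y]` is the mean holding time in state `y`.

```python
import random


def simulate_jump_process(p, dt, y0, t_end, rng):
    """Simulate a Markov jump process on [0, t_end).

    Returns the list of (jump_time, state_after_jump) pairs, starting with
    (0.0, y0). Self-transitions are recorded as jumps of zero magnitude.
    """
    t, y = 0.0, y0
    path = [(t, y)]
    while True:
        # expovariate takes a rate, i.e. the reciprocal of the mean holding time
        t += rng.expovariate(1.0 / dt[y])
        if t >= t_end:
            return path
        states, probs = zip(*p[y])
        y = rng.choices(states, weights=probs)[0]
        path.append((t, y))


if __name__ == "__main__":
    p = {0: [(0, 0.2), (1, 0.8)], 1: [(0, 1.0)]}
    dt = {0: 0.1, 1: 0.05}
    path = simulate_jump_process(p, dt, 0, 1.0, random.Random(0))
    print(len(path), path[:3])
```

Replacing the exponential draw by the constant `dt[y]` would give the first interpolation instead, with deterministic sojourn times.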

Neither interpolation is always preferable to the other. Interpolation 2 could have been used in references [2], [3], but there did not seem to be a need for it then. There are advantages to having an interpolation which is a continuous parameter Markov chain in that certain concepts (such as stationarity) have a clearer meaning; on the other hand it is sometimes preferable to work with interpolation times that are deterministic functions of the current state, since then there are fewer random variables to worry about. The limiting processes (see Sections 6 and 7) are the same for both interpolations. In Case 2, the average sojourn time in a state $y$ (before the next jump, whether of zero value or not) is $\Delta t^h(y)$, precisely the interpolation interval for Case 1. In both cases, the time spent at a state $y$ on the boundary ($O(h)$ per sojourn) is greater than the time spent at a state $y$ in $G_h$ ($O(h^2)$ per sojourn, unless there is the complete degeneracy $\sigma(y) = 0$). This property is a consequence of our definition of $\Delta t^h(y)$ for $y \in \partial G_h$ (to correspond to $\rho(y) = 1$).

For either Interpolation 1 or 2,

(17)  $\bar\gamma^h = \lim_{t\to\infty} E_x^{u^h} \int_0^t k(\xi_s^h, u_s^h)\,ds / t,$

where $u_s^h = u^h(\xi_s^h)$, and $E_x^u$ indicates that $u$ is used. The invariant measure for either interpolation is $\mu^h$, where

(18a)  $\mu^h(y) = \Delta t^h(y)\pi^h(y) \Big/ \sum_z \Delta t^h(z)\pi^h(z).$

Also,

(18b)  $\bar\gamma^h = \sum_y \mu^h(y) k(y,u^h(y)).$

Equations (17) and (18) are not hard to verify. For example, (18) follows from the ergodic theorems for Markov chains (see Chung [6], Section 1.15, Theorems 1, 2, 3; see also [2], Chapter 6.8, for similar calculations). It can also be obtained by direct verification of the Kolmogorov equation using the invariance of $\pi^h(\cdot)$ for the discrete parameter chain. To get (17), write $u_i^h$ for $u^h(\xi_i^h)$ and use (15) and the same ergodic theorems to get

(19)  $\bar\gamma^h = \lim_{n\to\infty} E_x \sum_{i=0}^{n-1} k(\xi_i^h,u_i^h)\Delta t_i^h \Big/ E_x \sum_{i=0}^{n-1} \Delta t_i^h$

      $= \lim_{n\to\infty} \Big[ \sum_{i=0}^{n-1} k(\xi_i^h,u_i^h)\Delta t_i^h \Big/ \sum_{i=0}^{n-1} \Delta t_i^h \Big]$  (w.p.1)

      $= \lim_{t\to\infty} \int_0^t k(\xi_s^h,u_s^h)\,ds / t$  (w.p.1)

      $= \lim_{t\to\infty} E_x \int_0^t k(\xi_s^h,u_s^h)\,ds / t.$

Similarly, the first limit in (19) equals

(20)  $\bar\gamma^h = \sum_y \pi^h(y) k(y,u^h(y))\Delta t^h(y) \Big/ \sum_y \pi^h(y)\Delta t^h(y) = \sum_y \mu^h(y) k(y,u^h(y)).$

Let $v(\cdot)$ denote a stationary pure Markov control. Then (15) implies that (here $\{\xi_i^h\}$ refer to the variables under control $v(\cdot)$) for any $x$

(21)  $\bar\gamma^h \le \lim_{n\to\infty} E_x^v \sum_{i=0}^{n-1} \Delta t_i^h\, k(\xi_i^h, v(\xi_i^h)) \Big/ E_x^v \sum_{i=0}^{n-1} \Delta t_i^h = \gamma^{v,h}.$

The proof of optimality of $u^h(\cdot)$ with respect to any control which is not necessarily stationary pure Markov can be based on a method of Ross [7] and is omitted.


6. Weak Convergence. We will work with Interpolation 2, since it is a strictly stationary process. The method will be outlined, but the proofs will usually be referred to when already available elsewhere. So far, we have a sequence of stationary pure Markov controls $\{u^h(\cdot)\}$, corresponding stationary continuous parameter Markov chains $\{\xi^h(\cdot)\}$, invariant measures $\{\mu^h\}$, and minimum costs $\{\bar\gamma^h\}$, where

$\bar\gamma^h = \sum_{y \in \bar G_h} \mu^h(y) k(y,u^h(y)) = \sum_{y \in G_h} \mu^h(y) k(y,u^h(y)) + \sum_{y \in \partial G_h} \mu^h(y) k_0(y,u^h(y)),$

and

(22)  $\bar\gamma^h t = E^h \Big[ \int_0^t k(\xi_s^h,u_s^h) I_G(\xi_s^h)\,ds + \int_0^t k_0(\xi_s^h,u_s^h) I_{\partial G}(\xi_s^h)\,ds \Big],$

where $E^h$ denotes the expectation under initial measure $\mu^h$ and we use $u_s^h = u^h(\xi_s^h)$. We often write $\xi^h(s)$ as $\xi_s^h$, etc., for typographical simplicity.

We obviously can write

(23)  $\xi_t^h = \xi_0^h + \int_0^t f(\xi_s^h,u_s^h) I_G(\xi_s^h)\,ds + \int_0^t \gamma(\xi_s^h,u_s^h) I_{\partial G}(\xi_s^h)\,ds + B^h(t) + B_0^h(t),$

where

$B^h(t) = \int_0^t I_G(\xi_s^h)\,[d\xi_s^h - f(\xi_s^h,u_s^h)\,ds],$

$B_0^h(t) = \int_0^t I_{\partial G}(\xi_s^h)\,[d\xi_s^h - \gamma(\xi_s^h,u_s^h)\,ds].$

Denote the two integrals in (22) by $K^h(t)$ and $K_0^h(t)$, resp., and the first two integrals on the right side of (23) by $Q^h(t)$ and $Q_0^h(t)$, resp. Let $D^m[0,\infty)$ denote the space of $R^m$ valued functions on $[0,\infty)$, continuous on the right and with left-hand limits (Billingsley [9], Lindvall [10], Kushner [2], Chapter 2), endowed with the Skorokhod topology. If a measure $\nu^h$ induces a process $X^h(\cdot)$ with paths in $D^m[0,\infty)$ w.p.1 and $\{\nu^h\}$ is tight, we abuse terminology and say that $\{X^h(\cdot)\}$ is tight. If $\{\nu^h\}$ converges weakly to a measure $\nu$ and $\nu$ induces a process $X(\cdot)$ with paths in $D^m[0,\infty)$ w.p.1, we say that $\{X^h(\cdot)\}$ converges weakly to $X(\cdot)$. We occasionally use Skorokhod imbedding ([11], Theorem 3.1.1, or [2], Chapter 2), which says that if $X^h(\cdot) \to X(\cdot)$ weakly in $D^m[0,\infty)$, then there are processes $\tilde X(\cdot), \tilde X^h(\cdot)$ with paths in $D^m[0,\infty)$ and which induce the same measures on $D^m[0,\infty)$ as do $X(\cdot), X^h(\cdot)$, resp., and are such that $\tilde X^h(\cdot) \to \tilde X(\cdot)$ w.p.1 in the Skorokhod topology. Since all our limit processes will be continuous w.p.1, this implies that $\tilde X^h(t) \to \tilde X(t)$, uniformly on bounded intervals. Also, we omit the tilde notation. The following theorem follows from the results in [4], Section 8.


Theorem 1.† $\{\xi^h(\cdot), K^h(\cdot), K_0^h(\cdot), B^h(\cdot), B_0^h(\cdot), Q^h(\cdot), Q_0^h(\cdot)\} = \{\Phi^h(\cdot)\}$ is tight on $D^{5r+2}[0,\infty)$, and all limits have continuous paths w.p.1.

We will next characterize the limits of $\{B^h(\cdot), B_0^h(\cdot)\}$. Let us choose a weakly convergent subsequence, also indexed by $h$, and henceforth fixed. The subsequent results will not depend upon the selected subsequence. Denote the limit by $\xi(\cdot)$, $K(\cdot)$, $K_0(\cdot)$, $B(\cdot)$, $B_0(\cdot)$, $Q(\cdot)$, $Q_0(\cdot)$. By construction, $B^h(t)$ and [...], where the covariance coefficient converges to $\sigma(x)\sigma'(x)$ as $h \to 0$, uniformly in $x$, and $\sup_h E|B^h(t)|^4 < \infty$ for each $t > 0$. Then $\{|B^h(t)|^2\}$ is uniformly integrable for each $t$. Let $\mathcal{B}_t$ denote the $\sigma$-algebra induced by $\{\xi_s, B(s), K(s), K_0(s), Q(s), Q_0(s),\ s \le t\}$.

†Theorem 1 does not require A1 or A2 and holds whether the initial conditions are random or not. It needs only the boundedness and continuity of $f, \sigma, k, k_0$ and $\gamma$. Also, $u^h$ can be replaced by any pure Markov control.

Let $N_\varepsilon(\partial G)$ denote an $\varepsilon$ neighborhood of $\partial G$. In [3], Lemma 1, it is shown that for each real $T > 0$ there is a constant $K$ such that, for Interpolation 1 and small $\varepsilon > 0$,

(24)  [...]

uniformly in $u, h$ (although $u$ did not appear in the derivation, only an upper bound to the values of the drift function $f$ was used in the derivation). The result (24) depends only on the fact that the component of the diffusion term $\sigma(x)\,dw$ orthogonal to the boundary is uniformly non-degenerate on $\partial G$; i.e., on (A1).

Estimate (24) also holds for Interpolation 2, and is crucial for the rest of the development. It says that neither the approximations nor the limit can "linger" near (but not on) the boundary. In particular, it implies that the probability is zero that over some subinterval of $[0,T]$ the paths for the approximations will be in $G$ and the limit will be on $\partial G$.


Theorem  2 . Assume  Al.  {B(t),^^l 

ft 

with  quadratic  covariation  q^G^^s 


Proof. The proof, using (24), follows similar calculations in [2],
[3], [4]. Let $q^h(\cdot)$ represent any of the vector-valued processes
of Theorem 1, with weak limit $q(\cdot)$, let $n$ denote an arbitrary
integer, $t_i$, $i \le n$, numbers less than or equal to $t$, let $s > 0$
and let $g(\cdot)$ denote a continuous real valued function. By weak
convergence, Skorokhod imbedding and the uniform integrability of
$\{|B^h(t)|\}$ for each $t$, the result (martingale property of $B^h(\cdot)$)

$E\, g(q^h(t_i), i \le n)\,[B^h(t+s) - B^h(t)] = 0$

implies

$E\, g(q(t_i), i \le n)\,[B(t+s) - B(t)] = 0$.

Also, the result

$E\, g(q^h(t_i), i \le n)\,[(B^h(t+s) - B^h(t))(B^h(t+s) - B^h(t))' - \cdots] = 0$,

together with the weak convergence, Skorokhod imbedding and uniform*
integrability of $\{|B^h(t)|^2\}$ and (24), implies that

$E\, g(q(t_i), i \le n)\,\Big[(B(t+s) - B(t))(B(t+s) - B(t))' - \int_t^{t+s} I_G(\xi_\tau)\,\sigma(\xi_\tau)\,\sigma'(\xi_\tau)\,d\tau\Big] = 0$.

The arbitrariness of $g(\cdot)$, $t$, $t + s$, $t_i$, $i \le n$, and $n$ implies
the theorem. Q.E.D.
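
The last step of the proof, from the two displayed identities to the martingale statement, can be spelled out as below. This is a sketch of the standard argument, assuming that $g$ ranges over bounded continuous functions and that $\mathcal{B}_t$ is generated by the values $q(t_i)$, $t_i \le t$, of the limit processes; the symbols $\mathcal{B}_t$ and $A(t)$ are shorthand introduced only for this sketch.

```latex
\begin{align*}
& E\, g(q(t_i), i \le n)\,[B(t+s) - B(t)] = 0
  \quad \text{for all } n,\ t_i \le t,\ g \\
& \qquad \Longrightarrow \quad
  E\,[\,B(t+s) - B(t) \mid \mathcal{B}_t\,] = 0 \quad \text{w.p.1}, \\[4pt]
& A(t) = \int_0^t I_G(\xi_\tau)\,\sigma(\xi_\tau)\,\sigma'(\xi_\tau)\,d\tau, \\[4pt]
& E\, g(q(t_i), i \le n)\,
  \big[(B(t+s)-B(t))(B(t+s)-B(t))' - (A(t+s)-A(t))\big] = 0 \\
& \qquad \Longrightarrow \quad
  E\,\big[\,B(t+s)B'(t+s) - A(t+s) \mid \mathcal{B}_t\,\big]
  = B(t)B'(t) - A(t) \quad \text{w.p.1}.
\end{align*}
```

The first implication is the martingale property of $B(\cdot)$. The second uses that property to remove the cross terms and says that $B(t)B'(t) - A(t)$ is a martingale, which identifies $A(\cdot)$ as the quadratic covariation.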


We next need a representation for $Q(\cdot)$, $Q_0(\cdot)$, $K(\cdot)$ and $K_0(\cdot)$.
It is easy to see that all these functions are absolutely continuous
with respect to Lebesgue measure. Thus, there are measurable $(\omega,t)$
functions $\bar q(\cdot)$, $\bar q_0(\cdot)$, $\bar k(\cdot)$ and $\bar k_0(\cdot)$ such that, for almost
all $\omega, t$,

$Q(t) = \int_0^t \bar q(s)\,ds$,    $Q_0(t) = \int_0^t \bar q_0(s)\,ds$,

$K(t) = \int_0^t \bar k(s)\,ds$,    $K_0(t) = \int_0^t \bar k_0(s)\,ds$.


* Actually, uniform integrability of $\{|B^h(t)|^2\}$ (implied by
$\sup_h E|B^h(t)|^4 < \infty$) is not needed. Since $B(\cdot)$ is a square
integrable continuous martingale, its quadratic variation can be
obtained by a "localization" of the argument.


We can now proceed in two ways, either working with generalized
random controls or by imposing a convexity condition and using an
implicit function theorem. We take the latter (and easier) approach.

Theorem 3. Assume A1 and A2. Let $f, k, k_0, \gamma, \sigma$ be continuous and
let the sets $\{f(x,\alpha), k(x,\alpha), \alpha \in \mathcal{U}\} \equiv g(x,\mathcal{U})$ and
$\{\gamma(x,\alpha), k_0(x,\alpha), \alpha \in \mathcal{U}_0\}$ be convex for each $x$. Then there
is a control $u(\cdot)$*, with values $u_s$ in $\mathcal{U}$ when $\xi_s \in G$ and in
$\mathcal{U}_0$ when $\xi_s \in \partial G$, and such that, for almost all $\omega, t$,

$\bar q(t) = f(\xi_t, u_t)$,

$\bar q_0(t) = \gamma(\xi_t, u_t)$,

$\bar k(t) = k(\xi_t, u_t)$.

Proof. Define $g(t) = (\bar q(t), \bar k(t))$ and $g_0(t) = (\bar q_0(t), \bar k_0(t))$.
The proof uses the basic estimate (24) and the method of [2],
pp. 182-183. By (24) and [2], pp. 182-183, for almost all $\omega, t$,

$g(t) \in g(\xi_t, \mathcal{U})$,

from which the result follows by the McShane-Warfield implicit
function theorem as in [2], Theorem 9.2.2. Q.E.D.


Summing up the results of Theorems 1 to 3, we get the representation
(under A1 and A2)

(25)  $\xi(t) = \xi(0) + \int_0^t I_G(\xi_s)\,f(\xi_s,u_s)\,ds + \int_0^t I_{\partial G}(\xi_s)\,\gamma(\xi_s,u_s)\,ds + B(t)$,

where $B(t)$ is a continuous martingale with quadratic variation

$\int_0^t I_G(\xi_s)\,\sigma(\xi_s)\,\sigma'(\xi_s)\,ds$.


* This control is also non-anticipative with respect to the $w(\cdot)$
introduced below (25).


Also, there is a Wiener process $w(\cdot)$ with respect to which all the
other processes in (25) are non-anticipative and such that
$B(t) = \int_0^t I_G(\xi_s)\,\sigma(\xi_s)\,dw(s)$. Obviously, by the weak
convergence, $\xi_t$ is in $\bar G$ for all $t$. Consider the differential
generator associated with (25) in $G$. By a slight modification of the
argument associated with (10) and (41) in [3], we can show that
$\xi(\cdot)$ solves the corresponding sub-martingale problem.

Furthermore, $\xi(\cdot)$ is a stationary process. Let its invariant
measure be denoted by $\mu$ (which is the weak limit of $\{\mu^h\}$), and
let $\bar\gamma = \lim_h \bar\gamma^h$. Then the distribution of $\xi_t$ is $\mu$. By (22),
(24),

(26)  $\bar\gamma = E^{\mu}\Big[\int_0^1 I_G(\xi_s)\,k(\xi_s,u_s)\,ds + \cdots\Big]$

Remarks. The limit process $\xi(\cdot)$ is stationary, as is the drift
$f(\cdot)$, but we have not been able to show that there is a Markov
(reflecting diffusion) process with the same distributions. There
probably is such a Markov process, as there probably is a
stationary pure Markov control $u(\cdot)$ such that $u(\xi_t) = u(\omega,t)$
w.p.1. In any case, our method gives much information on the
optimal process $\xi(\cdot)$; e.g., the multivariate distributions of
$\xi^h(\cdot)$ converge weakly to those of $\xi(\cdot)$, as do the distributions
of any bounded measurable functional $F(\xi^h(\cdot))$, if $F(x(\cdot))$ is
continuous w.p.1 with respect to the measure induced by $\xi(\cdot)$.
Indeed, one of the great advantages of the weak convergence method
is that it yields such information, in addition to approximations
to $\bar\gamma$. Also, $\bar\gamma$ = average cost per unit time for $\xi(\cdot)$, and is the
limit of the average costs per unit time for the sequence of
approximations.
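
The statement that the average costs per unit time of the approximating chains converge can be illustrated numerically in a much simpler setting than the one treated here. The sketch below is an assumption-laden toy, not the construction of this paper: one space dimension, $G = (0,1)$, a fixed control absorbed into the drift, instantaneous reflection at the two boundary points, and no boundary cost. The transition probabilities are of the usual upwind finite-difference form; the function name `chain_average_cost` and its arguments are hypothetical.

```python
import numpy as np


def chain_average_cost(drift, sigma, cost, n_grid):
    """Average cost per unit time of a reflecting random walk on [0, 1].

    Assumed toy model: grid spacing h = 1/n_grid, interior transition
    probabilities of upwind finite-difference form, interpolation interval
    dt(x) = h^2 / (sigma^2 + h|f(x)|), and instantaneous reflection
    (zero holding time, zero cost) at the grid points 0 and 1.
    """
    h = 1.0 / n_grid
    n = n_grid + 1
    x = np.linspace(0.0, 1.0, n)
    P = np.zeros((n, n))   # one-step transition matrix of the chain
    dt = np.zeros(n)       # interpolation interval at each grid point

    for i in range(1, n_grid):
        f = drift(x[i])
        norm = sigma**2 + h * abs(f)
        P[i, i + 1] = (sigma**2 / 2 + h * max(f, 0.0)) / norm
        P[i, i - 1] = (sigma**2 / 2 + h * max(-f, 0.0)) / norm
        dt[i] = h**2 / norm

    # Reflection: a boundary point is sent to its interior neighbour at once.
    P[0, 1] = 1.0
    P[n_grid, n_grid - 1] = 1.0

    # Invariant measure: solve mu P = mu together with sum(mu) = 1.
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    mu = np.linalg.solve(A, b)

    k = np.array([cost(xi) for xi in x])
    # Cost per unit of interpolated time under the invariant measure.
    return float(mu @ (k * dt) / (mu @ dt))


if __name__ == "__main__":
    for n_grid in (10, 50, 200):
        value = chain_average_cost(lambda x: -1.0, 1.0, lambda x: x, n_grid)
        print(n_grid, value)
```

For this toy model the limit can be checked by hand: with drift $-1$ and $\sigma = 1$ the reflected diffusion on $[0,1]$ has invariant density proportional to $e^{-2x}$, so the stationary mean of $k(x) = x$ is $1/2 - 1/(e^2 - 1)$, and the printed values approach this number as the grid is refined.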


7. Optimality of the Limit $\xi(\cdot)$. Being a limit of optimal
approximating processes, $\xi(\cdot)$ is a good candidate for optimality
for the original optimization problem (with the reflected diffusion
model). Certain optimality properties are easy to show.

Theorem 4. Assume A1 and A2. Let $v(\cdot)$ denote a continuous
stationary pure Markov control, such that the corresponding
reflecting diffusion $\xi^v(\cdot)$ is unique (in the weak sense) and has a
unique invariant measure $\mu^v$ (where we let the initial measure be
$\mu^v$). Then $\bar\gamma \le \gamma^v$.

Proof. Let $\{\xi_n^{v,h}\}$ and $\xi^{v,h}(\cdot)$ denote the discretized and
interpolated processes, resp., corresponding to the fixed control
$v(\cdot)$. Then the cost $\gamma^{v,h}$ for the interpolated process is
$\ge \bar\gamma^h$ by optimality of $u^h$. Let $\mu^{v,h}$ denote any invariant
measure for $\xi^{v,h}(\cdot)$. Then $\{\xi^{v,h}(\cdot)\}$ and the invariant measures
$\{\mu^{v,h}\}$ converge weakly to $\xi^v(\cdot)$ and $\mu^v$, resp., as $h \to 0$ by
arguments similar to those in Theorems 1 to 2. The theorem follows
from this and (24). Q.E.D.
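
The inequality used in the proof, that the optimal chain cost is no larger than the chain cost under any fixed stationary control, can be seen on a small example. The sketch below is only an illustration under assumed data: a one-dimensional reflecting random walk on $[0,1]$ with instantaneous reflection, drift $f(x,u) = u$, running cost $k(x,u) = x^2 + 0.01u^2$, and a three-point control set. All names (`CONTROLS`, `average_cost`, `best_stationary_policy`) are hypothetical, and the minimum is found by brute-force enumeration of stationary policies, which is feasible only for very coarse grids.

```python
import itertools

import numpy as np

CONTROLS = (-1.0, 0.0, 1.0)  # assumed finite control set, for illustration only


def average_cost(policy, n_grid, sigma=1.0):
    """Average cost per unit time under a stationary policy.

    policy[i - 1] is the control used at interior grid point i * h.
    Assumed toy model: drift f(x, u) = u, cost k(x, u) = x^2 + 0.01 u^2,
    instantaneous reflection (no holding time, no cost) at 0 and 1.
    """
    h = 1.0 / n_grid
    n = n_grid + 1
    P = np.zeros((n, n))
    dt = np.zeros(n)
    k = np.zeros(n)
    for i in range(1, n_grid):
        u = policy[i - 1]
        norm = sigma**2 + h * abs(u)
        P[i, i + 1] = (sigma**2 / 2 + h * max(u, 0.0)) / norm
        P[i, i - 1] = (sigma**2 / 2 + h * max(-u, 0.0)) / norm
        dt[i] = h**2 / norm
        k[i] = (i * h) ** 2 + 0.01 * u**2
    P[0, 1] = 1.0
    P[n_grid, n_grid - 1] = 1.0

    # Invariant measure of the controlled chain.
    A = P.T - np.eye(n)
    A[-1, :] = 1.0
    b = np.zeros(n)
    b[-1] = 1.0
    mu = np.linalg.solve(A, b)
    return float(mu @ (k * dt) / (mu @ dt))


def best_stationary_policy(n_grid):
    """Enumerate all stationary policies and return (min cost, policy)."""
    policies = itertools.product(CONTROLS, repeat=n_grid - 1)
    return min(
        ((average_cost(p, n_grid), p) for p in policies),
        key=lambda pair: pair[0],
    )


if __name__ == "__main__":
    best_cost, best_policy = best_stationary_policy(6)
    fixed_cost = average_cost((0.0,) * 5, 6)  # comparison control v = 0
    print(best_cost, best_policy, fixed_cost)
```

Here the comparison control $v \equiv 0$ plays the role of $v(\cdot)$ in Theorem 4 at the level of a single grid size; the theorem itself concerns the limit $h \to 0$, which this enumeration does not address.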

Since we have not been able so far to prove that $u(\cdot)$ is
stationary pure Markov, it would be nice to prove that $u(\cdot)$ is
optimal with respect to a broader class of controls than those in
Theorem 4. The class can be broadened, but at the expense of
considerable terminology and detail. We refer the reader to [2],
where broader classes of comparison controls are dealt with for a
number of other types of optimization problems.